arXiv: 1507.031 llv2 [math.DS] 10 Aug 2015 


Kernel Methods for Linear Discrete-Time Equations 

Fritz Colonius 

Institut fiir Mathematik, Universitat Augsburg, Augsburg/Germany 

Boumediene Hamzi 

Department of Mathematics, Kog University, Istanbul, Turkey. 


Abstract: Methods from learning theory are used in the state space 
of linear dynamical and control systems in order to estimate the system 
matrices. An application to stabilization via algebraic Riccati equations is 
included. The approach is illustrated via a series of numerical examples. 

Keywords: Reproducing Kernel Hilbert spaces, linear discrete-time 
equations, parameter estimation, linear control systems, identification, Ric¬ 
cati equations. 

1 Introduction 

This paper discusses several problems in dynamical systems and control, 
where methods from learning theory are used in the state space of linear 
systems. This is in contrast to previous approaches in the frequency domain 
[HE]. We refer to [6] for a general survey on applications of machine 
learning to system identification. 

Basically, learning theory allows to deal with problems when only data 
from a given system are given. Reproducing Kernel Hilbert Spaces (RKHS) 
allow to work in a very large dimensional space in order to simplify the 
underlying problem. We will discuss this in the simple case when the matrix 
A describing a linear discrete-time system is unknown, but a time series from 
the underlying linear dynamical system is given. We propose a method to 
estimate the underlying matrix using kernel methods. Applications are given 
in the stable and unstable case and for estimating the topological entropy 
for a linear map. Furthermore, in the control case, stabilization via linear- 
quadratic optimal control is discussed. 

The emphasis of the present paper is on the formulation of a number of 
problems in dynamical systems and control and to illustrate the applicability 
of our approach via a series of numerical examples. 
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The contents is as follows: In Section [2] the problem is stated formally 
and an algorithm based on kernel methods is given for the stable case. In 
Section [3] the algorithm is extended to the unstable case. In particular, 
the topological entropy of linear maps is computed (which boils down to 
computing unstable eigenvalues). In Section 0] identification of linear con¬ 
trol systems is considered and Section [5] discusses their stabilization. Here 
we insert the estimate of the system matrix (obtained via learning theory) 
into the relevant algebraic Riccati equation and study when this yields a 
stabilizing feedback. Every section contains several numerically computed 
examples (via MATLAB) illustrating the approach. Section [6] draws some 
conclusions from the numerical experiments. For the reader’s convenience 
we have collected in the appendix basic concepts from learning theory as 
well as some hints to the relevant literature. 

2 Statement of the problem 

Consider the linear discrete-time system 

x{k + 1) = Ax{k), (1) 

where A = [aij] € We want to estimate A from the time series 

a:(l) + r]i, ■ ■ ■, x{N) + rj]\f where the initial condition a:(0) is known and 
r]i are distributed according to a probability measure px that satisfies the 
following condition (this is the Special Assumption in [lOjl. 

Assumption The measure px is the marginal on A = M"' of a Borel 
measure p on A x M with zero mean supported on [—Mj,,Ma,],Mj, > 0. 

One obtains from ([ll) for the components of the time series that 

n 

Xi{k + 1) = '^aijXj{k). (2) 

j=i 

For every i we want to estimate the coefficients aij,j = 1, • • • ,n. They are 
determined by the linear maps f* : MA M given by 

n 

(xi, ...,Xn) ^ (3) 

This problem can be reformulated as a learning problem as described in the 
Appendix where /* in ([3|) plays the role of the unknown function (f73]l and 
{x{k),Xi{k -|- 1) -|- Pi) are the samples in (17^ . 
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We note that in |10] . the authors do not consider time series and that 
we apply their results to time series. 

In order to approximate f*, we minimize the criterion in (I78p . For 
a positive definite kernel K let fi be the kernel expansion of f* in the 
corresponding RKHS T-Lk- Then fi = with certain coefficients 

Cij G M and 

°° c? ■ 

\\fi\\nK = '^^i ( 4 ) 

j=i f 

where {Xj,(j)j) are the eigenvalues and eigenfunctions of the integral operator 
Lk ■ C(‘T) given by {LKf){x) = f K{x,t)f{t)dv{t) with a Borel 

measure v on X. Thus Lxfij = Xjfj for j G N* and the eigenvalues Xj > 0. 

Then we consider the problem of minimizing over (ci^i, ••• ,Ci,N) the 
functional 

1 ^ 

£i = - fiix{k))f +ji\\fi\\'^^, (5) 

' k=l 

where yi{k) := Xi{k + 1) + rn = ff{x{k)) + r]i and 7 ^ is a regularization 
parameter. 

Since we are dealing with a linear problem, it is natural to choose the 
linear kernel k{x,y) = {x,y). Then the solution of the above optimization 
problem is given by the kernel expansion of Xi{k + 1 ), i = 1, ■ ■ • , n, 


N 

yi{k) := Xi{k + 1 ) = ^Cij{x{j),x{k)), 
i=i 

where the Cij satisfy the following set of equations: 


Xi{l) 

( \ 

Ql 


= iVA/rf + K 


_ Xi{N) _ 

V / 

Qiv 


with 


K : = 


YTi=iXi>{'\-)xi>{N - 1 ) 


TJi=iXe.{N)xi{Q) 
YJl=iXi{N)x(,{N - 1 ) 


This is a consequence of Theorem IA.21 


( 6 ) 


( 7 ) 


( 8 ) 
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From ([2]), we have 

N N N n 

^i(k +1) = E Cij{x{j),x{k)) = x{jf -xik) = EE CijXi{j)xe{k) 

j = l j = l j = l £=1 

n N 

= EE Cijxe{j)x£{k). 

e=i j=i 

Then an estimate of the entries of A is given by 

N 

au = '^Cijxe{j). (9) 

i=i 

This discussion leads us to the following basic algorithm. 

Algorithm A: If the eigenvalues of A are all within the unit circle, one pro¬ 
ceeds as follows in order to estimate A. Given the time series x(l), • • • , x{N) 
solve the system of equations ([7|) to find the numbers Cij and then compute 
an from 

Before we present numerical examples and modifications and applica¬ 
tions of this algorithm, it is worthwhile to note the following preliminary 
remarks indicating what may be expected. 

The stability assumption in algorithm A is imposed, since otherwise the 
time series will diverge exponentially. Then, already for a moderately sized 
number of data points {N « 10^) equation ([7|) will be ill conditioned. Hence 
for unstable A, modifications of algorithm A are required. 

While for test examples one can compare the entries of the matrix A and 
its approximation A, it may appear more realistic to compare the values 
x(l), • • ■ ,x{N) of the data series and the values x(l), • • • ,x(A) generated 
by the iteration of the matrix A. 

In general, one should not expect that increasing the number of data 
points will lead to better approximations of the matrix A. If the matrix A 
is diagonalizable, for generic initial points x(0) G M"' the data points x(k) 
will approach for A —oo the eigenspace for the eigenvalue with maximal 
modulus. For general A and generic initial points x(0) G M", the data points 
x(N) will approach for A ^ oo the largest Lyapunov space (i.e., the sum 
of the real generalized eigenspaces for eigenvalues with maximal modulus). 
Thus in the limit for A —oo, only part of the matrix can be approximated. 
A detailed discussion of this (well known) limit behavior is, e.g., given in 
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Colonius and Kliemann [1]. A consequence is that a medium length of the 
time series should be adequate. 

This problem can be overcome by choosing the regularization parameter 
7 in ([5]) and dT]) using the method of cross validation described in [ 8 ] . Briefly, 
in order to choose 7 , we consider a set of values of regularization parameters: 
we run the learning algorithm over a subset of the samples for each value of 
the regularization parameter and choose the one that performs the best on 
the remaining data set. Cross validation helps also in the presence of noise 
and to improve the results beyond the training set. 

A theoretical justification of our algorithm could be guaranteed by the 
error estimates in Theorem IA.51 In fact, for the linear dynamical system 
dU, we have that f* in (1731) is the linear map f*{x) = fi{x) in (l3|) and the 
samples s in (17^ are {x{k),Xi{k + 1) + r]i). Moreover, by choosing the linear 
kernel k{x,y) = {x,y) we get that f* G T-Lk- 

Next we discuss several numerical examples, beginning with the following 
scalar equation. 

Example 2.1. Consider x{k + 1) = ax{k) with a = 0.5. With the initial 
condition x(0) = —0.5, we generate the time series x(l), • • • , x(lOO). Apply¬ 
ing algorithm A with the regularization parameter 7 = 10“® we compute 
a = 0.4997. Using cross validation, we get that a = 0.5 with regularization 
parameter 7 = 1.5259-10~®. When we introduce an i.i.d perturbation signal 
r]i G [—0.1,0.1], the algorithm does not behave well when we fix the regular¬ 
ization parameter. With cross validation, the algorithm works quite well and 
the regularization parameter adapts to the realization of the signal r]i. Here, 
for e{k) = x{k) — x{k) with x{k -|- 1) = ax{k) and x{k -|- 1) = ax{k), we get 

that ||e(300)|| = = 0-0914 and = 1.8218-10-30. 

We observe an analogous behavior of the algorithm when the data are 
generated from x{k -|- 1 ) = ax{k) + ex{k)‘^ where the algorithm works well 
in the presence of noise and structural perturbations when using cross vali¬ 
dation. When e = 0.1 and with an i.i.d perturbation signal r]i G [-0.1,0.!], 
OL varies between 0.38 and 0.58 depending on the realization of r]i but 

l|e(300)|| = = 0.2290 and y^E?=ioo = 2-8098 - lO-^o 

which shows that the error e decreases exponentially and the generalization 
properties of the algorithm are quite good. 
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Example 2.2. Consider x{k + 1) = Ax{k) with matrix A given by 


■ -0.5 10 O' 

0 0.6 1 0 

■“ 0 0 0.7 1 

0 0 0 - 0.8 


( 10 ) 


For the initial condition x = [—0.9,0.1,15,0.2]' and with N = 100 data 
points, we get 


■ -0.5000 1.0000 0.0000 -0.0000 ■ 

. _ 0.0000 0.6000 1.0000 0.0000 

0.0000 -0.0000 0.7000 0.9994 

-0.0000 0.0000 -0.0000 -0.7995 


We then simulate x{k +1) = Ax{k) and x{k +1) = Ax{k) for /c = 0, • • • , 200 
to test the accuracy of our approximation beyond the interval /c = 0, • • • , 100. 
Then the norm of the error Cj (A:) = Xj{k)—Xj{k),iov j = I,-- - ,4, ||ej(300)|| = 

\J (^) is of the order of 10 “^ and y^X]i=ioo order of 

10 “^^ which shows that the error e decreases exponentially and the gen¬ 
eralization properties of the algorithm are quite good. The regularization 
parameters are 7 * = 0.9313 • 10“® for i = 1, • • • ,4. 

Also in the presence of small noise rn G [—0.01,0.01], the algorithm 
behaves well and the regularization parameters adapt to the realization of 
rji- For example, for a certain realizations of r]i, we obtain the regularization 
parameters 

71 = 0.0039,72 = 2.4114 • 10■^ 73 = 9.3132 • 10"^°,74 = 2 • 10“^ (12) 

and the error \ \ej (300)| ] = ®j(i) of order of 10“^ and 

is of the order of 10 “® . 

Suppose that in addition to a small noise r/i G [—0.01,0.01], there is a 
quadratic structural perturbation, i.e., 


xi{ky 

x{k + 1) = Ax{k) + s T^fkf 

X4{k)‘^ 


Then with cross validation for e = 0.001 the algorithm behaves well. For 
a particular realization of 7 , the error |jej(300)j| = ^(^) between 
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5 and 15 but is of the order of 10 ® and the regularization 

parameters are 

= 0 . 5,72 = 9.3132 • 10"^°, 73 = 9.3132 • 10“^°, 74 = 9.3132 • 10"^°. (14) 
These examples show a very good behavior of the algorithm. 


3 Unstable case 

Consider 

x{k + 1) = Ax{k) with A G (15) 

where some of the eigenvalues of A are outside the unit circle. Again, we 
want to estimate A when the following data are given, 

x(l),x(2),...,x(iV), (16) 

which are generated by system (fTSl) . thus x{k) = A^~^x{l). 

As remarked above, a direct application of the algorithm A will not work, 
since the time series diverges fast. Instead we construct a new time series 
from (1161) associated to an auxiliary stable system. 

For a constant a > 0 we define the auxiliary system by 



y{k + 1) = Ay{k) with A := —A. 

a 

(17) 

Thus 

vi^)={f) ^(1) 

(18) 

and with y{l) 

= x(l) one finds 



y{k) - ._^A^ ^ 2 ;( 1 ) - ^_^x{k). 

(19) 


If we choose cr > 0 such that the eigenvalues of ^ are in the unit circle, we 
can apply algorithm A to this stable matrix and hence we would obtain an 
estimate of ^ and hence of A. However, since the eigenvalues of the matrix 
A are unknown, we will be content with a somewhat weaker condition than 
stability of 

The data (fT6]) for system (fT^ yield the following data for system (fT71) : 

y(l) := x(l), y(2) := -x{2 ),..., y(iV) := ^x(iV). (20) 
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We propose to choose a as follows: Define 


Clearly the inequality a < ||yl|| holds. We apply algorithm A to the time 
series y{k). This yields an estimate of ^ and hence an estimate A of A. 

For general A, this choice of a certainly does not guarantee that the 
eigenvalues of ^ are within the unit circle. However, as mentioned above, 
a generic data sequence x{k),k G N, will converge to the eigenspace of 
the eigenvalue with maximal modulus. Hence will approach the 

maximal modulus of an eigenvalue, thus this choice of a will lead to a matrix 
— which is not “too unstable”. 

<7 

Example 3.1. Consider x{k + 1) = ax{k) with ol = 11.46. With the 
initial condition x(0) = —0.5, we generate the time series x(l), • • • ,x(100). 
The algorithm above with the regularization parameter 7 = 10“® yields the 
estimate d = 11.4086. Cross validation leads to the regularization parameter 
7 = 9.5367 • 10“^ and the estimate a = 11.4599. In the presence of a small 
noise rj G [—0.1,0.!], cross validation yields the regularization parameter 
7 = 0.002 and the slightly worse estimate d = 11.1319. 


We observe the same behavior in higher dimensional systems where the 
eigenvalues are of the same order of magnitude. 

Example 3.2. Consider x{k + 1) = Ax{k) with 


A = 


20 0 

0 -10 
0 0 

0 0 


0 

0 

15 

0 


0 

0 

0 

-25 


( 22 ) 


Using cross validation, we get that 



■ 20.0000 

0.0000 

0.0001 

0.0000 

A = 

- 0.0000 

- 10.0000 

0.0000 

- 0.0000 

0.0000 

- 0.0000 

14.9998 

0.0000 


- 0.0000 

- 0.0000 

- 0.0000 

-25.0003 

0.9313 • 10-9, i = l,- 

•• ,4. 




(23) 


For different realizations of a noise r/i of magnitude 0.5 • 10 cross 
validation gives a good approximation of A and the eigenvalues of A — A are 



all within the unit disk with amplitude of the order of 10 “^ showing that 
the dynamics of the error e{k) = x{k) — x{k) is asymptotically stable. For 
example, for a particular realization of r/j of magnitude 0.5 • 10“^, we get 


A 


19.9635 

-0.0177 

-0.0177 

-0.0132 


0.0086 

-10.0025 

-0.0025 

-0.0167 


0.1365 

0.0379 

15.0376 

0.0065 


-0.0007 

-0.0007 

-0.0007 

-25.0000 


(24) 


with regularization parameters 

71 = 1.9073-10"®, 72 = 9.3132-10"^°, 73 = 9.3132 -10"^°, 74 = 1.2207-10"^. 

(25) 

The algorithm fails in the presence of quadratic structural perturbations. 
This is due to the choice of a linear kernel. A polynomial kernel, for example, 
would allow for nonlinear perturbations but this would require a complete 
reformulation of our algorithm. We leave the extension of our algorithm to 
the nonlinear case for future work. 


The next example is an unstable system with a large gap between the 
eigenvalues. 

Example 3.3. Consider the system x(k + 1) = Ax{k) with 


A = 


20 0 
0 - 0.1 


(26) 


With the initial condition x(0) = [—1.9,1], we generate the time series 
x(l), - - - ,x(100). The algorithm above yields the (excellent) estimate 


20.0000 0.0000 

- 0.0000 - 0.1000 


(27) 


In the presence of noise of maximal amplitude 10“^ , the algorithm approx¬ 
imates well only the large entry an = 20: For a first realization of r]i and 
with cross validation, we get 


19.9997 -0.0111 
0.0000 -0.1104 ’ 


(28) 


with 7 i = 1.5259 - 10 ® and 72 


2^®. However another realization of rji 


leads to 


19.9994 -0.0011 
0.0000 - 0.0000 ’ 
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(29) 



with 7 i = 3.0518 • 10“^ and 72 = 2.8147 • 10^^. This is due to the fact 
that the data converge to the eigenspace generated by the largest eigenvalue 
A = 20. However, the eigenvalues of A—A are within the unit disk with small 
amplitude which guarantees that the error dynamics of e(k) = x{k) — x{k) 
converges to the origin quite quickly. We observe the same phenomenon 
with 


A = 


-0.5 0 

0 25 


(30) 


Here, in the absence of noise, we obtain the estimate 


-0.5000 0.0000 

-0.0000 25.0000 


(31) 


with 7 i = 72 = 0.9313 T0“®. In the presence of noise r/j with amplitude 10“^, 
the data converge to the eigenspace corresponding to the largest eigenvalue 
A = 25: for some realization of rji one obtains the estimate 


-0.4809 0.0008 

0.0164 24.9960 


while for another realization of rj 

A = 


- 0.0000 

-1.0067 


- 0.0000 

24.8696 


(32) 


(33) 


The regularization parameters 71 and 72 adapt to the realization of the 
noise. 


As already remarked in the end of Section [ 2 l we see that “more data” 
does not always necessarily lead to better results, since the data sequence 
converges to the eigenspace generated by the largest eigenvalue. However, 
whether with or without noise, the approximations of A are good enough to 
reduce the error between x(k + 1 ) = Ax{k) and x{k + 1 ) = Ax{k) outside 
of the training examples, since cross-validation determines a good regular¬ 
ization parameter 7 that balances between good fitting and good prediction 
properties. 

The next example has an eigenvalue on the unit circle. 

Example 3.4. Consider x{k -|- 1) = Ax{k) with 


A 


2.2500 -1.2500 
3.7500 -2.7500 
0 0 

0 0 


1.2500 -49.5500 

13.1500 -20.6500 
10.4000 -32.3000 
0 -21.9000 


(34) 
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The set of eigenvalues of A is spec(yl) = {—1.5000,1.0000,10.4000, —21.9000}. 
In the absence of noise and initial condition x = [—0.9,15,1.5.2.5] with 
N = 100 points, we compute the estimate 


2.2500 

-1.2500 

1.2498 

-49.5499 ■ 

3.7500 

-2.7500 

13.1498 

-20.6499 

0.0000 

0.0000 

10.3998 

-32.2999 

0.0000 

0.0000 

- 0.0001 

-21.8999 _ 


and regularization parameters 71 = 72 = 0.9313 -10 In this case, the set 
of eigenvalues of A is 


spec(i) = {-21.9000,10.3999, -1.5000,1.0000}. (36) 


For a given realization of 7 € [—10 ^,10 ^], we obtain the estimate 


2.2551 

-1.2490 

1.2187 

-49.5304 

3.7554 

-2.7489 

13.1175 

-20.6297 

0.0055 

0.0011 

10.3669 

-32.2794 

0.0053 

0.0010 

-0.0325 

-21.8797 


(37) 


with 71 = 0.0745 • 10“^ and 72 = 0.1490 • 10“'^. The eigenvalues oi A — A 
are of the order of 10 “^ which guarantees that the error dynamics converges 
quickly to the origin. However, the set of eigenvalues of A is 


spec(i) = {-21.8996,10.3999, -1.5026,1.0134}. (38) 


Hence an additional unstable eigenvalue occurs. 
Example 3.5. Consider x{k + 1) = Ax{k) with 


A = 


-0.8500 0.4500 -0.4500 
-1.3500 0.9500 14.3500 

0 0 15.3000 

0 0 0 


-77.8500 

-11.6500 

-55.3000 

-40.0000 


The eigenvalues of A are given by 


(39) 


spec(H) = {-0.4000,0.5000,15.3000, -40.0000}. (40) 


For an initial condition x = [ 


we get 

■ -0.8498 
, _ -1.3499 

0.0001 
-0.0004 


0.9; 15; 1.5; 2.5] and with N = 100 data points, 


0.4501 

0.9500 

0.0001 

- 0.0002 


-0.4499 

14.3501 

15.3001 

-0.0004 


-77.8504 

-11.6502 

-55.3004 

-39.9987 


(41) 
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with eigenvalues given by 

spec(i) = {-40.0000, -0.3974,0.4982,15.3008}. (42) 

Here we used 7 * = 10“^^, i = 1, • • • ,4. Moreover, the eigenvalues of A — A 
are quite small and such that the error dynamics converges quickly to the 
origin. In the presence of noise 7 , the algorithm approximates the largest 
eigenvalues of A but does not approximate the smaller (stable) ones. For 
example, for a particular realization of noise with amplitude 10 “^, we get 
the estimate 


A = 


- 2.1100 

-1.7053 

-0.8277 

-0.8283 


-0.0993 

0.7777 

-0.3692 

-0.3694 


-1.3259 

13.9397 

14.6466 

-0.6539 


-74.4543 

-10.5308 

-52.9920 

-37.6904 


(43) 


and spec(i) = {-40.0009,0.1620 ± 0.8438^ 15.3008}. 

For another realization of noise with amplitude 10“^, we get the estimate 


A = 


-138.0893 

-60.7052 

-105.8111 

301.5029 

-0.2435 

0.9101 

12.9638 

-12.6745 

-71.1408 

-31.9557 

-40.3842 

142.3170 

-71.1408 

-31.9557 

-55.6843 

157.6172 

-40.1391,3.9326,0.9601, 

15.3002}. 



(44) 


The algorithm introduced above also allows us to compute the topo¬ 
logical entropy of linear systems, since it is determined by the unstable 
eigenvalues. Recall that the topological entropy of a linear map on M"" is 
defined in the following way: 

Fix a compact subset K C M"", a time r G N and a constant e > 0. Then 
a set R C M” is called (r, e)-spanning for K if for every y £ K there is x G i? 
with 

^A^y — A^x^ < e for all j = 0, ...,r. (45) 

By compactness of K, there are finite (r, e)-spanning sets. Let ii be a (r, e)- 
spanning set of minimal cardinality = rYamiT,E, K). Then 

htop{K,A,£) := lim - log £, K),htop{K, A) := llm htop{K, e). (46) 

T^oo r £-;-0+ 

(the limits exist). Finally, the topological entropy of A is 


htopi^A^) •— sup/ijop(iL, j4), 
K 


(47) 
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where the supremum is taken over all compact subsets K of M". 

A classical result due to Bowen (cf. |17l Theorem 8.14]) shows that the 
topological entropy is determined by the sum of the unstable eigenvalues, 
i.e., 

htop{A) = y~]max(l, |A|), (48) 

where summation is over all eigenvalues of A counted according to their 
algebraic multiplicity. 

Hence, when we approximate the unstable eigenvalues of A by those of 
the matrix A, we also get an approximation of the topological entropy. 

Example 3.6. For Example 13.41 we get that htop{A) = 34.80 while for the 
estimate A one obtains htop{A) = 34.7999. For Example 13.51 we get that 
htop{A) = 55.30 and htop{A) = 55.3008. These estimates appear reasonably 
good. 

4 Identification of Linear Control Systems 

Consider the linear control system 

x{k + 1) = Ax{k) + Bu{h), (49) 

with A € and B € We want to estimate the matrices A and 

B from the time series x(l) + ryi, • • • , x{N) + r]N where r] satishes the As¬ 
sumption in Section [2j The initial condition x(0) and the control sequence 
tt(0), ■ ■ ■ ,u{N) are assumed to be known. 

In order to estimate A and B, we will extend algorithm A. The zth 
component of system (I49|) is given by 

n 

Xi{k + 1) = ^ aijXj{k) + biu{k). (50) 

For every i we want to estimate the coefficients bi and aij,j = 1, ••• ,n. 
Thus the linear map fi : M”" ^ M given by 

n 

{xi,...,Xn,u) ^'^aijXj + biU (51) 

i=i 

is unknown. To extend algorithm A, we will view system (1501) as a system 
of the form (l2|) where the state x is the extended state x = {x,u)£ M” x M 
for (I49|) . Hence, the kernel expansion ([6l) becomes 
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( 52 ) 


N 

Xi{k + 1 ) = '^Cij{x{j),x{k)) 

i=i 

where = u and the Cij satisfy the following set of equations: 


with 

K = 


Xi{l) 

( \ 



= N\U + K 


_ Xi{N) _ 

V / 

CiN 




(53) 


(54) 


Let us emphasize that u = x^+i does not appear on the left hand side of 

In reference to the case when A has eigenvalues outside the unit circle, 
we adopt the same method as in Section [3] and define 

u :=max|-iJ=^^|yjp,A: E {0, . (55) 

Example 4.1. (One Dimensional Case) Consider x{k + 1) = —0.9x{k) + 
3.5m. For an input u{k) = sin(A:) + cos{k) and for 100 points we obtain 
the estimate A = —0.9 and B = 3.5 when there is no noise rji. Here cross 
validation gives 71 = 1.5259 • 10“*^® and 72 = 1. For a certain realization of 
the noise 77 * with amplitude 0.1, we get A = —0.9008 and B = 3.4983. Here 
cross validation gives 71 = 0.0078 and 72 = 1. 


Example 4.2. (Three Dimensional Stable Case) Consider control system 
([4^ with 



■ -0.9 

1 

0 


■ -2.5 ■ 

A = 

0 

- 0.1 

1 

and B = 

-3.5 


0 

0 

0.8 


4.5 


(56) 


With the input u{k) = sin(A:) + cos(fe) and 100 points, one computes the 
estimates 


■ -0.9000 

1.0000 

0.0000 ■ 


■ -2.5000 ■ 

0.0000 

- 0.1000 

1.0000 

and B = 

-3.5000 

- 0.0000 

- 0.0000 

0.8000 


4.5000 


( 57 ) 
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Here cross validation gives the regularization parameters 7 ^ = 0.1526 • 10“^ 
for i = 1, • • • ,4. For some realization of perturbations rji with amplitude 
0 . 1 , one computes the estimates 


■ -0.9047 

0.9984 

-0.0029 ■ 


■ -2.5326 ■ 

-0.0047 

-0.1016 

0.9971 

and B = 

-3.5321 

-0.0048 

-0.0018 

0.7971 


4.4661 


Here cross validation gives 71 = 9.7656 • 10 72 = 9.7656 • 10 73 = 

1.5259 • 10-^ 74 = 4. 


Example 4.3. (Three Dimensional Unstable Case) Consider control system 
(|4^ with 


■ -20 

1 

0 


■ 1 ■ 

0 

1 

1 

and B = 

2 

0 

0 

20 


3 


(59) 


The input u{k) = sin(/i;) + cos(A:) and 100 points give the estimates 


■ -19.9945 

1.0009 

-0.0137 ■ 


■ 0.9898 ■ 

0.0013 

0.9995 

0.9919 

and B = 

1.9898 

0.0155 

-0.0171 

19.7835 


2.9333 


(60) 


Here cross validation yields the regularization parameters 7 ^ = 0.8882-10“^® 
for i = 1, • • • ,4. For some realization of perturbations r]i with amplitude 
10 “^, one computes the estimates 


■ - 20.0000 

0.9334 

-0.0058 ■ 


■ 0.9819 ■ 

-0.0008 

0.9382 

0.9939 

and B = 

1.9814 

-0.0008 

-0.0590 

19.9937 


2.9811 


(61) 


Here cross validation gives 71 = 72 = 0.2384 -10 73 = 74 = 0.0596 -10 ®. 


These results show that algorithm A works quite well in these cases. 


5 Stabilization via Linear-Quadratic Optimal Con¬ 
trol 

A basic problem for linear control systems is stabilization by state feedback. 
A standard method is to use linear quadratic optimal control, where the 
feedback is computed using the solution of an algebraic Riccati equation. 
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In this section, we propose to replace in the algebraic Riccati equation the 
system matrix A by the estimate A obtained by learning theory. 

The linear quadratic optimal control problem has the following form: 

Minimize over all (continuous) inputs u 


Joo(xo',u) = + u(k)~’^Ru(t) 

k=0 

with x(') given by 

x(k + 1) = Ax{k) + Bu{k), k > 0, x(0) = xq; 


(62) 


(63) 


here Q € is positive semidefinite and R G ]R»Tixm jg positive definite, 

Consider the discrete algebraic Riccati equation DARE 

A'^{P - PB{R + B'^PB)-^B^P)A + Q = P. (64) 


Obviously, every solution P is positive semi-definite. We cite the following 
theorem from [T]. 

Theorem. Suppose that for every xq € M” there is an input u, such 
that J{xo,u) < oo. Then the following holds: 

(i) There is a unique solution P of the DARE. 

(ii) Eor every xq G M” one has J*(xo) := inf{J(xo,u) | u an input} = 
Xq Pxq and there is a unique optimal input u* with J*(xo) = J(xo, u*). This 
optimal input is generated by the feedback F = {R + B^PB)~^B^PA and 

u{k) =—Fx{k),k > 0. (65) 

In particular, the feedback F stabilizes the system, i.e., x{k -|- 1) = (A — 
BF)x{k) is stable. 

Now we use an estimate A and B (obtained by kernel methods) instead 
of A and B in the algebraic Riccati equation and obtain the solution P. Will 
the corresponding feedback u = Fx := —B^Px also stabilize the system, 
i.e., is the following system stable: 

x{k + l) = {A-BB^P)x{k)7 (66) 

Example 5.1. Consider the one-dimensional system x(A:-|-l) = —0.9x(A:)-|- 
3.5u in Example 14.11 In the absence of noise, we get A = —0.9 and B = 3.5. 
We have that A — BF = A — BF = —0.0643. When there is noise of 
amplitude 0.1, we get that A = —0.9002 and B = 3.4929 and A — BF = 
—0.0643 while A — BF = —0.0610. Hence, the controller improves stability. 
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Example 5.2. Consider control system ()49ll with 


■ -0.9 

1 

0 


■ -2.5 ■ 

0 

-0.1 

1 

and B = 

-3.5 

0 

0 

0.8 


4.5 


( 67 ) 


As illustrated in Example 14.21 without noise we get excellent approximations 
of A and B. For both cases, the set of eigenvalues of the closed-loop system 
is {-0.6172,0.4049,-0.0018}. With a noise of maximal amplitude 0.1, the 
estimates A and B are given in Example 14.21 For the feedback system one 
finds 


spec(i - BF) = (-0.6204,0.4053, -0.0018}, 
spec(A - BF) = (-0.6240, -0.0062,0.4111}. 


In this example the feedback based on the estimate also stabilizes the original 
system. 

Example 5.3. Consider control system (I49p with 


■ -20 

1 

0 


■ 1 ■ 

0 

1 

1 

and B = 

2 

0 

0 

20 


3 


( 68 ) 


As Example 14.31 illustrates, without noise we get excellent approximations 
of A and B. For the feedback system one finds 


spec(i - BF) = (0.1994,0.0483, -0.0501}, 
spec(A - BF) = (-0.1234 ± 2.0777i, 0.5279}. 


When there is noise of amplitude 10 one computes the estimates 


■ -19.9805 

0.7484 

0.0135 


■ 1.0194 ■ 

-0.0062 

0.7969 

1.0107 

and B = 

2.0114 

-0.0229 

0.9851 

19.6776 


2.6673 


(69) 


This are bad approximations for A and B. Furthermore, for the feedback 
system one hnds 


spec(i - BF) = (0.1929,0.0477, -0.0501}, 
spec(A - BF) = (1.4510 ± 3.0103f, -2.5232}. 


Thus the stabilizing controller for the approximate system does not stabilize 
the true system. 
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6 Conclusions 


This paper has introduced the algorithm A based on kernel methods to 
identify a stable linear dynamical system from a time series. The numeri¬ 
cal experiments give excellent results in the absence of noise and structural 
perturbations. In the presence of noise and structural perturbations the 
algorithm works well in the stable case. In the unstable case, a modified 
algorithm works quite well in the presence of noise but cannot handle struc¬ 
tural perturbations. 

Then we have extended algorithm A to identify linear control systems. 
In particular, we have used estimates obtained by kernel methods to stabi¬ 
lize linear systems using linear-quadratic control and the algebraic Riccati 
equation. Here the numerical experiments seem to indicate that the same 
conclusions on applicability of the algorithm apply. 

Extensions of the considered algorithms to nonlinear systems appear 
feasible and are left to future work. 


A Appendix: Elements of Learning Theory 

In this section, we give a brief overview of Reproducing Kernel Hilbert Spaces 
(RKHS) as used in statistical learning theory. The discussion here borrows 
heavily from Cucker and Smale [5], Wahba [16], and Scholkopf and Smola 
m- Early work developing the theory of RKHS was undertaken by I.J. 
Schoenberg [HI |T3l |T1| and then N. Aronszajn [2|. Historically, RKHS 
came from the question, when it is possible to embed a metric space into a 
Hilbert space. 

Definition A.l. Let Li he a Hilbert space of functions on a set A which 
is a closed subset o/M”. Denote by {f,g) the inner product on H and let 
ll/ll = (fjy^^ be the norm in H, for f and g ^ H. We say that H is a 
reproducing kernel Hilbert space (RKHS) if there exists K : A x A such 
that 

i. K has the reproducing property, i.e., f{x) = {f{-),K{-,x)) for all 

f^n. 

a. K spans H, i.e., H = span{K{x, ■)\x € A}. 

K will be called a reproducing kernel ofH and Hk will denote the RKHS 
H with reproducing kernel K. 
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Definition A. 2. Given a kernel A : A x A —)■ M and inputs xi, - ■ ■ , G A, 
the n X n matrix 

k := {K{xi,Xj))ij, (70) 

is called the Gram Matrix of k with respect to xi, ■ ■ ■ ,Xn- If for all n G N 
and distinct Xj G A the kernel K gives rise to a strictly positive definite 
Gram matrix, it is called strictly positive definite. 

Definition A. 3. (Mercer kernel map) A function A : A x A —)• M is called 
a Mercer kernel if it is continuous, symmetric and positive definite. 

The important properties of reproducing kernels are summarized in the 
following proposition. 

Proposition A.l. If K is a reproducing kernel of a Hilbert space H, then 
i. K{x,y) is unique. 

a. For all x,y G X, K{x,y) = K{y,x) (symmetry). 

Hi. 3;^) > 0 forai G M and Xi G A (positive definitness). 

iv. {K{x,-),K{y,-))'u = K{x,y). 

V. The following kernels, defined on a compact domain A C M"", are 
Mercer kernels: K{x,y) = x-y~^ (Linear), K{x,y) = (l+x-y''')'^, d G 

_ I \=^—y\\^ 

N (Polynomial), K{x,y) = e 'P , u > 0 (Gaussian). 

Theorem A.l. Let K : X x X ^ W be a symmetric and positive definite 
function. Then there exists a Hilbert space of functions H defined on X 
admitting K as a reproducing Kernel. Moreover, there exists a function 
$ : A —> A such that 

K{x,y) = {^{x),^{y))n for x,yGX. (71) 

<f> is called a feature map. 

Gonversely, let H be a Hilbert space of functions / : A —> M, with X 
compact, satisfying 

For all X G X there is Kx > 0, such that |/(x)| < Kx||/||'H- (72) 

Then H has a reproducing kernel A. 

Remarks. 
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i. The dimension of the RKHS can be infinite and corresponds to the 

dimension of the eigenspace of the integral operator Lk '■ t 

C{X) defined as {LKf){x) = J K{x, t)f{t)dv{t) if K is a Mercer kernel, 
for / G C^{X') and v a Borel measure on X. 

ii. In Theorem lA.ll and using property [iv.] in Proposition lA.ll we can 

take $(x) := '■= K{x, •) in which case F = %- the “feature space” 

is the RKHS. This is called the canonical feature map. 

hi. The fact that Mercer kernels are positive definite and symmetric shows 
that kernels can be viewed as generalized Gramians and covariance 
matrices. 

iv. In practice, we choose a Mercer kernel, such as the ones in [v.] in 
Proposition lA.il and Theorem lA.il that guarantees the existence of a 
Hilbert space admitting such a function as a reproducing kernel. 


<1 

RKHS play an important role in learning theory whose objective is to 


find an unknown function 




f* :X 

(73) 

from random samples 




s = (xi,yi)|”Li, 

(74) 


In the following we review results from |ir)] (for a more general setting, cf. 
M) in the special case when the data samples s are such that the following 
assumption holds. 

Assumption 1: The samples in (I74p have the special form 

<5: s = {x,yx)\x&x, (75) 

where x = {xi}\f^l and yx is drawn at random from f*{x) + px, where px 
is drawn from a probability measure px- 

Here for each x £ X, px is a probability measure with zero mean, and 
its variance satisfies := YIxgx^x < oo- Let X be a closed subset of 
and 7 C X is a discrete subset. Now, consider a kernel K : X x X 
and define a matrix (possibly infinite) Kn : 7^(7) —>■ 7^(7) as 

{Kifa)s = '^K{s,t)at, s £t,a£ f{t), (76) 
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where is the set of sequences a = {at)t£i ■ t with (o, b) = 
dehning an inner product. For example, we can take X = M and t = 
,d}. 

In the case of a linear dynamical system dll), we are interested in learning 
the map x{k) !->■ x{k + 1). Here we can apply the following results. 

The problem to approximate a function f* G T-Lk from samples s of the 
form (1711) has been studied in mm- It is reformulated as the minimization 
problem 

/s,^ := argmin/6H^^-| J^(/(x) - + 7ll/lll:|, (77) 

where 7 > 0 is a regularization parameter. Moreover,when x is not dehned 
by a uniform grid on X, the authors of |10] introduced a weighting w := 
{wx}xGx on X with Wx > (0. Let be the diagonal matrix with diagonal 
entries {wx}x£x- Then, ||T>^|| < ||'w||oo- 

In this case, the regularization scheme (I77p becomes 

/s .7 := argmin/e^^jj ^'Wa,(/(a;) - + 7||/II^|, (78) 

Theorem A.2. Assume f* G Hxi and the standing hypotheses with X, 
K, i, p as above, y as in Suppose KixDwKx^i + is invertible. 

Define C to be the linear operator L = {K^ xDwKx xDy^. Then 

problem has the unique solution 

/s,^ = Y.{^y)tKt (79) 

Assumption 2‘. For each x G X, pa, is a probability measure with zero 
mean supported on [—Mx,Mx] with Bw ■= (Z)xgs^ 

The next theorems give estimates for the different sources of errors. 

Theorem A.3. (Sample Error) \1(A Theorem 4, Propositions 2 and 3] Let 
Assumptions 1 and 2 be satisfied, suppose that Ki xDwKx is invert¬ 
ible and let /s,^ = be the solution of (fTSP given in Theorem I A. 21 

by c = Cy. Define 

■■= {Kj^xD^Kxpt + lKj^tr^Ki^xDll^ 

K := ||X,-f|| \\{Kt,xD^Kx,t + 7Kt^i)-Y- 

suggestion in |10| is to consider the px—volnme of the Voronoi cell associated with 
X. Another example is w = 1 or if |a;| = m < 00 , w — 
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Then for every 0 < <5 < 1, with probability at least 1 — S we have the sample 
error estimate 


/s,7 


-fi 


*i7 


|2 

\k 


< £samp 


:= Ka,„a 


-1 ( 11^11 


*^W 




h^cr:, 


log 


(80) 


where a{u) := {u — l)logu for u > 1. In particular, £samp 0 when 
7 —>■ oo or cj^ —)• 0. 


Theorem A.4. (Regularization Error and Integration Error) ]m. Propo¬ 
sition 4 and Theorem 5] Let Assumptions 1 and 2 be satisfied and let 
X = {Xx)xGx be the Voronoi cell of X associated with x and Wx = px{Xx)- 
Define the Lipschitz norm on a subset X' C X as ||/|lxip(x') II/IIl°°(X') + 




||s —'U||^oo(]jn) 

Lipschitz space satisfied 


and assume that the inclusion map of TLx t i'^to the 


Cx 


sup - 

f&HKrt 


\\f\?K 


< oo. 


(81) 


Suppose that x is A—dense in X, i.e., for each y & X there is some x € x 
satisfying ||x - yWtooQ^n^ < A. 

Then for f* € 


ii/x,7-riP<iiriil^(7+8c^A) 


(82) 


Theorem A.5. (Sample, Regularization and Integration Errors) ]1(K Corol¬ 
lary 5] Under the assumptions of Theorem,s \A.^ and \A.4\ let X = {Xx)x&x 
he the Voronoi cell of X associated with x and Wx = Px{Xx)- Suppose that 
X is A—dense, Cx < oo, and f* € RkL- Then, for every 0 < <5 < 1, with 
probability at least 1 — 5 there holds 


ll/s,7 — /*|P < ‘ICxSsamp + 2||/*|||i'(7 + 8C'sA), 


where Ssamp is given in (E^)- 


(83) 
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^This assumption is true if X is compact and the inclusion map of Hk,! into the space 
of Lipschitz functions on X is bounded which is the case when K is a, Mercer kernel 
[18]. In fact, if ||/||Lip(^) < Co||/||if for each / G Rku, then < Copx{X). 
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